Sub-linear Queries Statistical Databases: Privacy with Power
Abstract
We consider a statistical database in which a trusted administrator adds noise to query responses in order to preserve the privacy of individual database entries. In such a database, a query is a pair (S, f), where S is a set of rows in the database and f is a function mapping database rows to {0, 1}. The true response is $\sum_{r \in S} f(\mathrm{DB}_r)$, and a noisy version of it is released. Results in [3, 4] show that a strong form of privacy can be maintained using a surprisingly small amount of noise, provided the total number of queries is sublinear in the number n of database rows. We call this a sub-linear queries (SuLQ) database. The assumption of sublinearity becomes reasonable as databases grow increasingly large. The SuLQ primitive (a query and its noisy reply) gives rise to a calculus of noisy computation. After reviewing some results of [4] on multi-attribute SuLQ, we illustrate the power of the SuLQ primitive with three examples [2]: principal component analysis, k-means clustering, and learning in the statistical queries learning model.
Related resources
Privacy-Preserving Datamining on Vertically Partitioned Databases
In a recent paper Dinur and Nissim considered a statistical database in which a trusted database administrator monitors queries and introduces noise to the responses with the goal of maintaining data privacy [5]. Under a rigorous definition of breach of privacy, Dinur and Nissim proved that unless the total number of queries is sub-linear in the size of the database, a substantial amount of noi...
A Method for Protecting Access Pattern in Outsourced Data
Protecting the information access pattern, which means preventing the disclosure of data and structural details of databases, is very important in working with data, especially in the cases of outsourced databases and databases with Internet access. The protection of the information access pattern indicates that mere data confidentiality is not sufficient and the privacy of queries and accesses...
Sensitivity of Counting Queries
In the context of statistical databases, the release of accurate statistical information about the collected data often puts at risk the privacy of the individual contributors. The goal of differential privacy is to maximise the utility of a query while protecting the individual records in the database. A natural way to achieve differential privacy is to add statistical noise to the result of t...
Broadening the Scope of Differential Privacy Using Metrics
Differential Privacy is one of the most prominent frameworks used to deal with disclosure prevention in statistical databases. It provides a formal privacy guarantee, ensuring that sensitive information relative to individuals cannot be easily inferred by disclosing answers to aggregate queries. If two databases are adjacent, i.e. differ only for an individual, then the query should not allow t...
The Trade-off between Privacy and Fidelity via Ehrhart Theory
As an increasing amount of data is gathered nowadays and stored in databases, the question arises of how to protect the privacy of individual records in a database even while providing accurate answers to queries on the database. Differential Privacy (DP) has gained acceptance as a framework to quantify vulnerability of algorithms to privacy breaches. We consider the problem of how to sanitize ...